Abstract 

Identifying how organismal attributes and environmental change affect lineage diversification is essential to our 
understanding of biodiversity. With the largest phylogeny yet compiled for grasses, we present an example of a key 
physiological innovation that promoted high diversification rates. C 4 photosynthesis, a complex suite of traits that improves 
photosynthetic efficiency under conditions of drought, high temperatures, and low atmospheric CO2, has evolved 
repeatedly in one lineage of grasses and was consistently associated with elevated diversification rates. In most cases there 
was a significant lag time between the origin of the pathway and subsequent radiations, suggesting that the 'C 4 effect' is 
complex and derives from the interplay of the C 4 syndrome with other factors. We also identified comparable radiations 
occurring during the same time period in C 3 Pooid grasses, a diverse, cold-adapted grassland lineage that has never evolved 
C 4 photosynthesis. The mid to late Miocene was an especially important period of both C 3 and C 4 grass diversification, 
coincident with the global development of extensive, open biomes in both warm and cool climates. As is likely true for most 
"key innovations", the C 4 effect is context dependent and only relevant within a particular organismal background and 
when particular ecological opportunities became available. 
Introduction 

Within flowering plants, the grasses (Poaceae) are a remarkable 
clade, in terms of both species richness and ecological breadth. 
Comprising over 11,000 species, grasses are exceptionally diverse 
and a dominant feature of most open habitats throughout the 
world. Although many share a common morphological form, 
important physiological differences define various groups of 
grasses and act to sort these into environmental types. 

Grasses living in tropical and subtropical grassland or savanna 
systems almost exclusively utilize the C 4 photosynthetic pathway 
[1-3]. This trait is a complex modification over the ancestral C 3 
pathway that confers an advantage in open, hot, and dry 
conditions by concentrating CO2 inside plant cells and preventing 
high levels of photorespiration [4]. C4 photosynthesis characterizes 
several ecologically dominant, species-rich lineages, suggesting that 
the C4 trait may also promote lineage diversification, via either a 
reduction in extinction rate, an increase in speciation rate, or a 
combination of both. In the past decade, molecular phylogenies 
have revealed the existence of three species-poor grass lineages 
successively sister to the rest of Poaceae and have placed the bulk 
of grass diversity in either the BEP or PACMAD clade (Figure 1) 
[5-10]. All of the 22-24 C 4 origins occur within the PACMAD 
clade, while the similarly sized BEP is entirely C3 [5-10]. This 
clustering of all C 4 origins in one of the two major grass lineages 
may be partly due to increased evolutionary accessibility to the C 4 
trait in this clade, based on a shared set of leaf anatomical 
attributes [11]. 

In this study, we use phylogenetic comparative methods on 
large datasets to test for the effect of C 4 photosynthesis on 
diversification rates within grasses. While a densely sampled 
phylogeny of the entire grass lineage is central to accurately 
identifying shifts in diversification, most previous phylogenetic 
efforts have concentrated on relatively small subgroups, with the 
result that few markers are consistently sampled throughout the 
lineage, and many are difficult to align across distantly related taxa 
(e.g. [12-20]). Previous investigations of grass diversification rates 
have been hindered by this data structure and have included 
molecular data for less than 5% of grass diversity [21,22]. To 
incorporate as many species as possible without introducing large 
amounts of missing data into the sequence alignments, we 
constructed 14 separate phylogenies, corresponding to the main 
lineages inside grasses, and each built with a unique, optimal set of 
markers. Using a well-resolved backbone phylogeny [10], these 
were combined into a set of trees that contained 3,595 taxa 
(Figure 1), encompassing about 30% of the estimated diversity in 
Poaceae [23]. Using these phylogenies, we found a strong and 
significant effect of C 4 photosynthesis on diversification. We also 
explored these trees to identify shifts in diversification indepen- 
dently of any character state information, and interpret these 
analyses jointly, in the context of C 4 evolution and Miocene 
grassland expansion. 
Figure 1. Poaceae phylogeny with 3595 taxa. C 4 lineages are mapped in blue. Red labels indicate the PACMAD clade, yellow labels indicate the 
BEP clade, and grey labels indicate the early diverging Poaceae lineages. Lineage names are abbreviated as: O.P. Outlying Panicoideae, Ehrh. 
Ehrhartoideae, Ar.M. Arundinoideae+Micrairoideae, and Arist. Aristidoideae. 
doi:10.1371/journal.pone.0097722.g001 



Materials and Methods 

Sequence Mining and Matrix Assembly 

The majority of recent phylogenetic work in Poaceae has 
focused on specific subfamilies or genera and has employed a 
variety of fast-evolving chloroplast and nuclear markers (e.g. [12-20]). 
The nature of these studies has resulted in a wealth of 
sequence data for Poaceae, but many markers are both poorly 
sampled across the entire group and difficult to align across the 
entire clade. To circumvent the phylogenetic problems that arise 
from such data, specifically poor alignments, large amounts of 
missing sites, and large matrices ill-suited to computationally 
intensive analyses, we subdivided the tree-building approach. 
Fourteen sub-trees were constructed separately and subsequently 
inserted into a fossil-calibrated backbone phylogeny. This 
approach relies heavily on recent work in the grasses that has 
resolved deep relationships among the subfamilies and clarified 
discrepancies in various molecular dating efforts [6-8,10,24]. 

Sequence data were collected from GenBank with the PHLAWD 
tool (http://phlawd.net/ [25]) using the plant GenBank database 
generated on March 12, 2012. To avoid synonymy problems, all 
genus names were transformed to those accepted by the Kew 
taxonomic database, using the GrassBase [23] synonymy database. 
Because the taxonomic classification in GenBank is not consistent 
with the latest developments in grass taxonomy, clades based on 
GenBank names are not always monophyletic. Species were, 
therefore, sorted into groups based on previous studies [10] and 
inspected on preliminary phylogenetic trees as necessary. In 
general, monophyletic groups were defined to correspond to 
traditionally recognized clades. The Bambusoideae, Ehrhartoi- 
deae, Chloridoideae, Danthonioideae, Andropogoneae, Paspaleae, 
and Paniceae were all used. The species-poor sister clades 
Arundinoideae and Micrairoideae were combined, as were the 
outlying Panicoideae sensu GPWGII 2012 [10]. The Pooideae was 
too large to analyze in one piece, so after marker selection, 3 
monophyletic clades were separated from the Pooideae backbone 
and each was analyzed individually. Two representatives of each 
separated clade were retained with the remaining backbone 
Pooideae so that their monophyly and divergence date could be 
constrained, and the separated lineage could be reinserted later. 
PHLAWD was then used to create alignments for the most 
frequently sampled gene regions in each of the 14 clades using a 
coverage threshold of 0.4 and an identity threshold of 0.1. The 
three plastid markers matK, ndhF, and rbcL were included in each 
group and an additional 2 to 10 gene regions were added 
depending on the group sampled (Table S2). In total, 35 gene 
regions were incorporated in the analysis (sampling information in 
Tables S1, S2). 

Once the alignments were complete, the software trimAl [26] 
was used to remove sites with more than 70% missing data for 
each gene region and the MEGA software [27] was used to 
manually edit the alignment where necessary. In each group, the 
alignments were concatenated with Phyutility [28] and species 
names were checked against the GrassBase [23] synonymy 
database. A small number of names were referenced in Tropicos 
[29] but not in GrassBase [23], and were consequently considered 
to be recently described species. Synonyms, misspellings, subspe- 
cies, and varieties were manually removed whenever possible to 
leave a single representative sequence per accepted species. At this 
point, RAxML software [30] was used to build a tree with 20 
maximum likelihood searches, retaining the tree with the highest 
likelihood score across them. The phylogeny inferred for each 
group was manually inspected to identify taxa that had very long 
branches, representing potential errors. The sequences of these 
taxa were inspected by BLAST searches against GenBank, and 
putatively erroneous sequences, corresponding to either sequenc- 
ing or identification errors, were removed. 
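
The site-filtering step can be illustrated with a minimal Python sketch. This is not trimAl itself: the function name, the toy alignment, and the decision to treat '-', '?', and 'N' as missing data are assumptions made only for illustration. The sketch drops every alignment column in which more than 70% of the sequences are missing.

```python
def drop_gappy_columns(alignment, max_missing=0.70, missing_chars="-?Nn"):
    """Remove alignment columns whose missing fraction exceeds max_missing.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    missing_chars is an assumption; real tools define missing data their own way.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned to the same length")

    keep = []
    for col in range(length):
        n_missing = sum(alignment[name][col] in missing_chars for name in names)
        # Keep the column unless MORE than max_missing of the taxa lack data.
        if n_missing / len(names) <= max_missing:
            keep.append(col)

    return {name: "".join(alignment[name][c] for c in keep) for name in names}


# Hypothetical toy alignment (not data from this study).
toy = {
    "taxon_A": "ATG-C",
    "taxon_B": "AT--C",
    "taxon_C": "A---C",
    "taxon_D": "AT--T",
}
trimmed = drop_gappy_columns(toy)
```

In the toy alignment the third and fourth columns are missing in 75% and 100% of the taxa and are removed, while the second column (25% missing) is retained.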

Tree Building and Molecular Dating 

To estimate the age of the main grass lineages, dating analyses 
were first performed with a dataset of three previously sampled 
chloroplast genes and 543 taxa covering the entire grass family 
[10]. The software BEAST 1.7.2 [31] was run under a GTR+G+I 
substitution model, a Yule process for the prior distribution of 
node ages and a log-normal distribution for the prior on 
evolutionary rates among branches. Time-calibrated trees were 
obtained with two contrasting hypotheses for the placement of 
fossils [24]. Under calibration #1, which is based only on 
macrofossil calibrations and does not take into account fossil 
phytoliths whose placement is somewhat controversial [32], the 
crown age of the BEP-PACMAD clade followed a normal 
calibration density with a mean of 51.2 Ma and a standard 
deviation of 6.0 Ma [24]. Under calibration #2, which 
incorporates fossil phytoliths [32], the age of this same node 
followed a normal calibration density with a mean of 82.4 Ma and 
a standard deviation of 7.5 Ma [24]. In this second analysis, we 
also constrained the stem of Oryzeae to obtain dates compatible 
with phytolith fossil evidence [32], using an exponential distribu- 
tion with a mean of 10 Ma offset by 67 Ma. For these two analyses, 
the topology was not fixed, except for the monophyly of the 
ingroup (all taxa except Pharus). Trees were sampled every 5,000 
generations for 15,000,000 generations after a burn-in period of 
5,000,000 generations. Convergence, effective sample size, and the 
adequacy of the burn-in period were assessed using Tracer 1.5 
[31]. 
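
To make the two calibration schemes easier to compare, the following Python sketch computes the central 95% interval implied by each normal calibration density and the median of the offset exponential prior on the Oryzeae stem. The helper names are hypothetical, and the sketch describes only the priors as stated above, not the posterior ages estimated by BEAST.

```python
import math
from statistics import NormalDist


def normal_interval(mean, sd, mass=0.95):
    """Central interval of a normal calibration density (ages in Ma)."""
    tail = (1.0 - mass) / 2.0
    dist = NormalDist(mean, sd)
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


def offset_exponential_quantile(p, mean, offset):
    """Quantile of an exponential prior with the given mean, shifted by offset."""
    return offset - mean * math.log(1.0 - p)


# Calibration #1 (macrofossils only): mean 51.2 Ma, SD 6.0 Ma.
cal1 = normal_interval(51.2, 6.0)
# Calibration #2 (with phytoliths): mean 82.4 Ma, SD 7.5 Ma.
cal2 = normal_interval(82.4, 7.5)
# Oryzeae stem prior: exponential with mean 10 Ma, offset by 67 Ma.
oryzeae_median = offset_exponential_quantile(0.5, 10.0, 67.0)

print(f"calibration #1: {cal1[0]:.1f}-{cal1[1]:.1f} Ma")
print(f"calibration #2: {cal2[0]:.1f}-{cal2[1]:.1f} Ma")
print(f"Oryzeae stem prior median: {oryzeae_median:.1f} Ma")
```

Under these priors the 95% intervals are roughly 39.4-63.0 Ma and 67.7-97.1 Ma, so the two calibration densities overlap only in their tails.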

A phylogeny was then inferred separately for each previously 
defined group of grasses using the software BEAST as described 
above [31]. Crown node ages were fixed (uniform prior with range 
of 0.01 around the fixed value) to the dates obtained from the 
Bayesian consensus phylogeny estimated from the 543-taxon 
dataset (above), under calibration #1. All trees were then scaled to 
match the dates under calibration #2. All subsequent analyses 
were performed on both sets of time-calibrated phylogenetic trees. 
The monophyly of the ingroup was enforced to ensure proper 
rooting. For each dataset, two independent Markov Chain Monte 
Carlo analyses were run for 10-50 million generations, sampling 
every 1000-5000 generations, depending on the size of the 
dataset. Convergence, effective sample size, and the adequacy of 
the burn-in period were assessed using Tracer [31]. A burn-in 
period of 2,500,000-6,000,000 generations was chosen, again 
depending on the size of the dataset. For clades of over 150 taxa, 
convergence from random starting trees was extremely slow, and 
so the best of our previous 20 maximum likelihood RAxML trees 
was dated using non-parametric rate smoothing in r8s [33] and 
used as a starting point for each run. 

For each group, the maximum clade credibility tree was selected 
with TreeAnnotator [31] and the node heights of this tree were 
scaled in R to match each of the dating hypotheses by multiplying 
all branch lengths by the fraction (hypothesis root age / current root 
age). The calibrated phylogenetic trees were then manually 
inserted into the associated backbone phylogeny of 543 grasses 
[10], preserving the deep relationships among the groups and 
forming a set of all-inclusive, ultrametric phylogenies with 3595 
species each. With 544 genera represented, this tree contains more 
than 29% of the species and 71.2% of the recognized genera in 
Poaceae. Of the missing genera, only 6 have more than 10 species 
[23]. 
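
The rescaling step amounts to multiplying every branch length by a single factor. The study performed this in R; the Python sketch below is only an illustration under stated assumptions: the tree representation (parent and branch-length dictionaries), the function names, and the four-tip tree are all hypothetical.

```python
def root_age(parent, length, tip):
    """Sum branch lengths from a tip to the root (tree assumed ultrametric)."""
    age = 0.0
    node = tip
    while node in parent:
        age += length[node]
        node = parent[node]
    return age


def rescale(length, current_root_age, target_root_age):
    """Multiply all branch lengths by (target root age / current root age)."""
    factor = target_root_age / current_root_age
    return {node: bl * factor for node, bl in length.items()}


# Hypothetical ultrametric tree ((A,B)X,(C,D)Y)root; branch lengths in My.
parent = {"A": "X", "B": "X", "C": "Y", "D": "Y", "X": "root", "Y": "root"}
length = {"A": 2.0, "B": 2.0, "C": 3.0, "D": 3.0, "X": 8.0, "Y": 7.0}

current = root_age(parent, length, "A")   # root age before rescaling
scaled = rescale(length, current, 25.0)   # rescale to a 25 My root age
```

Because every branch is multiplied by the same factor, relative node ages and ultrametricity are preserved; only the absolute timescale changes.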

To take into account both phylogenetic uncertainty and 
variation in dating hypotheses, we repeated diversification analyses 
on 100 topologies drawn randomly from the population of trees 
sampled post burn-in by BEAST for each of our 14 groups. A 
unique, calibrated phylogeny for each group was scaled and added 
to each of our two backbone phylogenies to produce 100 
alternative phylogenies of the grasses under each set of dating 
conditions. 

Diversification Analyses 

Three approaches were used to analyze the patterns of 
diversification in Poaceae. First, the BiSSE (Binary State 
Speciation and Extinction) method [34,35] specifically evaluated 
the relationship between photosynthetic type and diversification 
rate. Second, log-scale species richness was compared among sister 
groups with different photosynthetic types using a Wilcoxon 
signed-rank test [36]. Third, turboMEDUSA, a likelihood method 
implemented in R [37], was used to locate and quantify shifts in 
diversification rates across Poaceae independently of any character 
state information. Since all of the C 4 origins occur within the 
PACMAD portion of Poaceae, our focus is on this clade, although 
we also ran analyses across the entire phylogeny. 
To effectively evaluate diversification patterns, it was necessary 
to determine the richness and distribution of Poaceae species on 
our phylogeny. Although taxonomic issues remain unsettled in 
certain areas of Poaceae phylogeny, we were able to approximate 
the size of most genera using the accepted names in GrassBase 
[23]. Unless otherwise demonstrated, genera were assumed to be 
monophyletic and occasionally small genera nested in larger ones 
were merged. For each genus, the species with the most sequence 
data was selected as the representative of that group and was 
assigned the richness of the entire genus. In genera with both C 3 
and C 4 taxa, we divided the genus into the minimum number of 
clades such that the C 3 taxa and the C 4 taxa were monophyletic 
and each represented by a single tip in our phylogeny. According 
to estimates from GrassBase [23], the phylogeny inferred in this 
study contains ~25% of the known Panicum species. This genus is, 
however, highly polyphyletic [38], and diverse sections have been 
segregated into new genera in the past few years [38-44]. To cope 
with this uncertainty, the number of Panicum species was equally 
spread among all Panicum tips in our phylogeny and the 
well-supported monophyletic groups of Panicum were subsequently 
collapsed. Using this approach, we were able to assign 11,554 
species (95.5% of Poaceae) to a specific tip on our tree [23]. 

An additional difficulty lay in the potential tendency for large 
clades on short branches to throw off diversification estimates. 
Therefore, large clades of over 190 species were split among 
several representatives. The genus Poa, for instance, which 
contains over 550 species, was divided evenly among the tips 
corresponding to Poa pratensis, Poa annua, and Poa colensoi. Even with 
similar subdivisions, excessively small state probabilities occasion- 
ally caused the BiSSE likelihood calculations to fail. In these cases, 
the groups were further subdivided among additional representa- 
tives or combined with a sister group to increase the subtending 
branch length. 

Using TurboMEDUSA, the number of shifts in diversification 
rates was first estimated with the default AICc threshold on our 
genus level tree (8.4547). Each representative tip was assigned the 
same richness value used for the BiSSE analyses. This approach 
suggested 24 shifts, some of which were located on extremely short 
branches leading to a single tip, with a relatively small number of 
species. These shifts were no longer identified with a more 
conservative threshold of 10.5, which suggested 18 shifts in 
diversification. These shifts were considered more reliable and are 
reported here. 
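
The effect of the AICc threshold on the number of reported shifts can be sketched as a stepwise stopping rule. The Python example below is a simplified illustration and not the turboMEDUSA implementation: the log-likelihoods, the parameter counts, and the sample size `n` are hypothetical values chosen to show how a stricter threshold yields fewer shifts.

```python
def aicc(log_lik, k, n):
    """Small-sample corrected AIC for a model with k parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("n must exceed k + 1")
    return 2 * k - 2 * log_lik + (2 * k * (k + 1)) / (n - k - 1)


def count_accepted_shifts(fits, n, threshold):
    """Count the shifts accepted by a stepwise rule.

    fits: list of (log_lik, k) for models with 0, 1, 2, ... rate shifts.
    A further shift is accepted only if it lowers AICc by at least `threshold`.
    """
    accepted = 0
    current = aicc(*fits[0], n)
    for log_lik, k in fits[1:]:
        candidate = aicc(log_lik, k, n)
        if current - candidate < threshold:
            break
        accepted += 1
        current = candidate
    return accepted


# Hypothetical model fits (not values from this study).
fits = [(-5000.0, 2), (-4980.0, 5), (-4972.0, 8), (-4969.0, 11)]
n_obs = 1000

shifts_default = count_accepted_shifts(fits, n_obs, 8.4547)
shifts_strict = count_accepted_shifts(fits, n_obs, 10.5)
```

With these made-up fits, the second shift improves AICc by about 9.9 units, so it is accepted at the 8.4547 threshold but rejected at 10.5, mirroring the way marginal shifts disappear under the more conservative setting.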

Pruning our large phylogeny down to single representatives of 
each genus allowed us to include information about unsampled 
diversity in our analyses, but it also reduced a substantial amount 
of branching structure and information. For example, using this 
approach with TurboMEDUSA precludes the identification of 
shifts that might occur closer to the tips, within genera for instance. 
We therefore performed a complementary TurboMEDUSA 
analysis on the complete 3595 species tree. We also ran BiSSE 
analyses on the unpruned, 3595 tip tree, accommodating 
unsampled diversity by reporting our overall sampling frequency 
(0.2973 for Poaceae, and 0.2966 for PACMAD [23]) which BiSSE 
then used in calculations. 

Tree inference, dating analyses, and diversification analyses 
were conducted on the OSCAR HPC cluster at Brown University 
and the Louise HPC cluster at Yale University. Sequence 
matrices, trees, and character matrices have all been deposited 
on Dryad, doi:10.5061/dryad.74b5d. 



Results 

BiSSE 

Our BiSSE analyses provided extremely strong support for the 
evolution of C 4 photosynthesis increasing diversification rates in 
grasses. All of the BiSSE tests on the BEAST maximum credibility 
tree strongly rejected the model of equal diversification rates for C3 
and C 4 taxa. This was irrespective of how we accommodated 
unsampled diversity, whether we analyzed PACMAD separately 
or together with all of Poaceae, and whether we calibrated our 
phylogeny with phytoliths or with less-controversial macrofossils 
(Table 1, Table S3). In most cases, the best-fitting model was a 6- 
parameter model in which both speciation and extinction rates 
were different for C 3 and C 4 taxa, but this was often only 
marginally better than models where C3 and C 4 lineages differed 
only in either speciation or extinction rates. Regardless, equal 
diversification rates were soundly rejected, and in all cases, C 4 
diversification rates were inferred to be higher. This C 4 effect can 
also clearly be seen in a lineage-through-time plot (Figure S1). 
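
The model comparison can be illustrated with a short Python sketch, assuming that the 6-parameter and 4-parameter (equal diversification) models are compared with a likelihood-ratio test on 2 degrees of freedom (6 minus 4 parameters). The log-likelihood values are hypothetical, and the function names are introduced only for this example. For 2 degrees of freedom the chi-squared survival function has the closed form exp(-x/2), so no external library is needed.

```python
import math


def lrt_statistic(lnl_full, lnl_constrained):
    """Likelihood-ratio statistic for nested models."""
    return 2.0 * (lnl_full - lnl_constrained)


def chi2_sf_df2(x):
    """P(X > x) for a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0) if x > 0 else 1.0


def chi2_critical_df2(alpha):
    """Critical value of chi-squared (2 df) at significance level alpha."""
    return -2.0 * math.log(alpha)


# Hypothetical log-likelihoods (not values from this study).
lnl_six_param = -1000.0    # speciation and extinction differ between C3 and C4
lnl_four_param = -1006.0   # equal diversification for C3 and C4

stat = lrt_statistic(lnl_six_param, lnl_four_param)
p_value = chi2_sf_df2(stat)

for alpha in (0.05, 0.01, 0.005):
    print(f"alpha={alpha}: critical value {chi2_critical_df2(alpha):.3f}")
print(f"statistic={stat:.2f}, p={p_value:.4f}")
```

Under this assumption the critical values are about 5.99, 9.21, and 10.60 for significance levels of 0.05, 0.01, and 0.005.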

The replicate BiSSE analyses on 100 trees from the posterior 
distribution indicate that these results are also robust to 
phylogenetic uncertainty. Across 100 replicate PACMAD trees, 
equal diversification rates were strongly rejected (p<0.01) in all 
but one tree regardless of whether the missing diversity was 
distributed proportionally or by genus (Figure 2; Figure S2). 
Poaceae-wide trees similarly provided additional support, but only 
when analyses were performed using the full 3,595 taxon tree and 
missing species were distributed evenly across the tips (Figure S2). 
Lack of support in these trees when diversity was distributed by 
genus is probably due to the extremely high numbers of species in 
C3 BEP genera like Festuca, Poa, and Stipa which were each 
clustered at a few tips in the genus-level analyses. 

In the PACMAD clade, when the missing species diversity was 
distributed by genus, BiSSE estimated a net C 4 diversification rate 
of 0.1458 spp/my and a net C3 rate of 0.0951 spp/my in the 
maximum credibility tree under the macrofossil-dating hypothesis 
(Table 1). When the missing species diversity was instead 
distributed proportionally, both diversification rates were estimat- 
ed to be much higher (0.2407 spp/my for C 4 , 0.1677 spp/my for 
C 3 ) (Table S3). When the entire Poaceae tree was used, the 
estimated C 3 and C 4 rates were very similar to those identified 
when using the PACMAD tree alone, with significantly higher C 4 
rates of diversification (p<0.01; Table S3). 
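
To give a sense of scale for these estimates, the sketch below converts the PACMAD net diversification rates from Table 1 into lineage doubling times and an expected fold increase over a fixed interval. It assumes constant-rate exponential growth in expected species number, N(t) = N0 exp(rt), which is a deliberate simplification; the function names and the 10 My interval are chosen only for illustration.

```python
import math


def doubling_time(rate):
    """Time (My) for expected species number to double at a constant net rate."""
    return math.log(2.0) / rate


def expected_fold_increase(rate, t):
    """Expected multiplicative growth in species number after t My."""
    return math.exp(rate * t)


r_c4 = 0.1458  # net C4 rate, spp/my (PACMAD, macrofossil dating, by genus)
r_c3 = 0.0951  # net C3 rate, spp/my (same analysis)

ratio = r_c4 / r_c3
t_double_c4 = doubling_time(r_c4)
t_double_c3 = doubling_time(r_c3)
fold_c4 = expected_fold_increase(r_c4, 10.0)
fold_c3 = expected_fold_increase(r_c3, 10.0)

print(f"rate ratio C4/C3: {ratio:.2f}")
print(f"doubling time: C4 {t_double_c4:.2f} My, C3 {t_double_c3:.2f} My")
```

Under this simplification the C 4 rate is about 1.5 times the C 3 rate, corresponding to doubling times of roughly 4.8 and 7.3 My.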

Under the phytolith-based dating calibration, the results from 
all analyses were consistent with those based on the macrofossil 
dates, with the obvious exception that actual net rates of 
diversification were estimated to be much lower, because grasses 
were inferred to be older. Similar contrasts between the C3 and C 4 
net diversification rates were evident, and models of equal 
diversification were rejected under the same conditions at similar 
levels of confidence (Table S3, Figure S2). 

Sister Group Comparisons 

Nearly half of the C 4 lineages in grasses are sister to groups that 
contain both C3 and C 4 taxa; however, 12 have exclusively C 3 
sister clades and could be compared directly (Table S4). Of these, 
the C 4 group is equally or more diverse in ten cases and the log- 
scaled species richness is significantly greater in C 4 groups 
(Wilcoxon signed-rank test, p = 0.0067). While the independent 
C 4 lineages differ greatly in both age and species richness, 
they are consistently more diverse than their C 3 sisters (Table S4). 
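
The sister-group comparison can be sketched as an exact two-sided Wilcoxon signed-rank test on log-scaled species richness. The Python example below is a minimal illustration, not the analysis reported in Table S4: the richness values are hypothetical, the function name is introduced here, and the sketch assumes no zero differences and no tied absolute differences, which a full implementation would need to handle.

```python
import math
from itertools import product


def signed_rank_exact(c4_richness, c3_richness):
    """Exact two-sided Wilcoxon signed-rank test on log richness differences.

    Assumes no zero differences and no ties among absolute differences.
    Returns (W+, p-value), where W+ is the sum of ranks of positive differences.
    """
    diffs = [math.log(a) - math.log(b) for a, b in zip(c4_richness, c3_richness)]
    n = len(diffs)

    # Rank the pairs by absolute difference (1 = smallest).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    total = n * (n + 1) // 2
    w_min = min(w_plus, total - w_plus)

    # Null distribution: each rank is equally likely to carry either sign.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if min(w, total - w) <= w_min:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical species richness for six C4 clades and their C3 sisters.
c4 = [120, 45, 300, 12, 80, 9]
c3 = [10, 30, 20, 15, 8, 3]
w_plus, p_value = signed_rank_exact(c4, c3)
```

Enumerating all sign patterns is feasible here because the number of sister pairs is small (2^12 = 4096 patterns for the 12 comparisons in this study).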
turboMEDUSA 

The distinctiveness of several C 4 lineages was also highlighted 
by turboMEDUSA (Figure 3, Table S5). Under the macrofossil 
dating hypothesis, when the missing species diversity was 
distributed by genus, the inferred diversification rate was low in 
early diverging grass lineages (0.036 spp/my), increased in the 
common ancestor of the BEP and PACMAD clades (0.143 spp/my), 
and further in a derived clade (core Panicoideae) containing 
14-16 C 4 origins (0.220 spp/my; Figure 3). Within the PACMAD 
lineage, there were five additional accelerations in diversification 
rate, four of which occurred within C4 clades, and the fifth of which 
occurred slightly before two subsequent C 4 origins. 

In addition to these PACMAD radiations, turboMEDUSA also 
detected increases within the cold-adapted Pooideae grasses 
(Figure 3; Table S5). Although the fastest rate inferred for grasses 
was in a C 4 genus, Tripogon, several other exceptionally high rates 
were found in the C 3 Pooideae resulting in young and highly 
diverse taxa such as Agrostis, Poa, Elymus, and Festuca. These BEP 
clade radiations appeared to be concurrent with many of the 
warm-climate C 4 radiations (Figure 4). The alternative dating 
hypothesis based on phytoliths identified precisely the same shifts, 
but all of the rates were slower and the timing of the shifts was 
earlier (Table S5, Figure 4). 

The turboMEDUSA results from our 3595-tip tree (not 
including any missing species) were generally similar although 
the exact location of many of the shifts differed significantly (Table 
S6). In all, 12 shifts in diversification rate were identified in the full 
tree. There was a clear acceleration from a slow background rate 
(0.009 spp/my) into the BEP+PACMAD clade (0.164 spp/my), 
followed by a series of more recent increases. In the PACMAD, 
there were 5 accelerations, 1 of which occurred within the entirely 
C 4 lineage Andropogoneae, and 3 of which were nested within C 4 
genera (Muhlenbergia and Paspalum). The final PACMAD acceler- 
ation was at the base of the C3 Danthonioideae, a lineage known, 
like the Pooideae, for its prominence in cool-climate grasslands [9]. 
In the BEP clade, accelerations within the Bambusoideae and 
Pooideae were followed by even more rapid diversification in 
Festuca, the Stipeae, and within Arundinaria. 

Discussion 

C 4 photosynthesis promotes elevated diversification 
rates 

In this study, we have approached the effect of C 4 photosyn- 
thesis on diversification rates by employing a variety of statistical 
tests that evaluate diversification patterns in very different 
manners. The BiSSE analysis provides strong statistical evidence 
that photosynthetic type influences diversification and that across 
the entire Poaceae tree, C 4 lineages have radiated faster than C3 
lineages. This evidence is compelling, but it does not take into 
account variation in clade-specific diversification that may be 
unrelated to C 4 photosynthesis and entirely dependent on other 
factors (e.g. generation time, dispersal strategies). However, our 
sister group comparisons confirmed that C 4 lineages are statisti- 
cally more speciose than their respective C3 sister clades. This 
indicates that within any given lineage background, the evolution 
of C 4 photosynthesis tends to increase the number of descendant 
species. The "turboMEDUSA" approach estimates the position 
on the phylogeny where shifts in diversification rates occurred, 
independent of any trait information [37]. These analyses located 
multiple accelerations in diversification rate across Poaceae, with 
many in C 4 PACMAD clades. Taken at face value, the location of 
most of these identified shifts suggests that there is a significant 
delay between the origin of the C 4 pathway and subsequent C 4 
Figure 2. Histograms of BiSSE model inferences based on 100 replicate PACMAD trees. Each tree had 1774 taxa, and the missing diversity 
was represented as a proportion (sampling frequency). Black bars indicate C 4 rates, white bars indicate C 3 rates. The panels show: a. Net 
diversification rates derived from a 6-parameter model, b. Chi-squared values derived from ANOVA comparison of a 6-parameter model and a 
4-parameter (equal diversification) model for each tree. The red lines indicate significance values of 0.05, 0.01, and 0.005. 
doi:10.1371/journal.pone.0097722.g002 



radiations. A similar pattern of delayed, Miocene shifts in C 4 
grasses was found in a previous diversification study using other 
methods [22]. Our results are also consistent with other "delayed 
rise" scenarios that seem to be a pattern identified in many groups 
of organisms [45-49]. 

C 4 photosynthesis is a physiological trait that has no obvious 
links with speciation, so how can it contribute to high diversifi- 
cation rates? The grass lineage is exceptionally diverse for its age, 
and some of the diversification rates we present here are among 
the highest reported in the literature under either calibration 
scheme (Table S5) [50,51]. The C 4 trait might have affected 
diversification by increasing speciation rates, decreasing extinction 
rates, or both. While our data do not statistically support one 
scenario over another, extinction rates are more consistently lower 
in C 4 lineages across a broad sampling of alternative phylogenetic 
trees, and lower extinction in C 4 lineages is generally favored, 
although not always strongly, by BiSSE (Table S3). We suggest 
that the elevated competitive ability of C 4 plants in hot, open 
environments [1] allowed newly formed species to survive in a 
range of environments, thus lowering extinction rates. Other grass- 
specific traits, such as their propensity for asexual reproduction, 
polyploidy, and long distance dispersal of seeds by wind might 
have acted as species-producing mechanisms in various clades 
throughout Poaceae [21,52]. Under this view, net C 4 diversifica- 
tion rates are higher, because fewer C 4 species generated by these 
life history mechanisms went extinct than did similarly generated 
C 3 species. The protracted spread of open systems might also have 
promoted diversification via a gradually growing patchwork of 
suitable habitats that allowed for repeated allopatric speciation 
events. In the tropics, the C 4 trait would have increased the 
likelihood of successful establishment in these open areas 
characterized by warm temperatures, moderate drought, and 
high radiation loads [9,53]. 

Environmental changes and diversification of C 4 grasses 

Interestingly, interpreting the context and environmental 
conditions in which the MEDUSA-identified shifts occurred 
depends heavily upon the tree-calibrations, and whether or not 
the recent phylogenetic placement of phytolith fossils is indeed 
accurate [32]. Under the macrofossil dating hypothesis, without 
phytolith evidence, C 4 origins appear almost entirely after the 
Oligocene atmospheric CO2 drop (with the possible exception of 
the Chloridoideae) [54], and almost all are within the time period 
in which C 3 grasslands are believed to have existed on various 
continents [55-62,2,63]. Under this younger dating scenario, C 4 
photosynthesis may have evolved coincidently with movement into 
already established C 3 -dominated grasslands. Previous authors 
have found evidence that C 4 origins are correlated with shifts to 
drier, more open habitats, which is congruent with this scenario 
[9,64]. Under this timescale, all of our pinpointed C 4 radiations 
occurred between about 5 and 16 Ma, during the time period in 
which the fossil record indicates a rapid ecological spread of C 4 
grasses [65-70,2]. This dating scenario suggests concurrent 
diversification and rise to dominance of C 4 species, with rapid 
radiations occurring well after C 4 origins, in open, grassy biomes. 

The dating hypothesis based on phytolith evidence suggests 
slightly different drivers of both C 4 evolution and of diversification. 
In this older scenario, at least five C 4 origins predate the fossil 
record of widespread open systems. In this case, C 4 photosynthesis 
might still have evolved in high-radiation, open habitats, but these 
areas would have been rare and fragmented across the landscape. 
Even less intuitive is that some of these origins would have 
occurred when atmospheric CO2 levels were very high, which 
would result in generally lower levels of photorespiration and a 
weaker selection pressure for evolution of the C 4 pathway. 
Regardless, in this case, both the C 4 and Pooid radiations would 
have begun around 15-25 Ma, during a time period that roughly 
coincides with the global appearance of open C 3 -dominated 
grasslands on various continents, but when C 4 species were rare 
[65-70,2]. It would suggest that C 4 species diversified rapidly 
before they became dominant features of the landscape. Only the 
shift in Tripogon could be coincident with the ecological spread of 
C 4 grasses. Interestingly, this second version of events shares some 
similarities to phytolith-based paleo-ecological reconstructions in 
North America, which suggest that substantial grass taxonomic 
diversity predated the late Miocene C 4 grassland expansion by 
23-27 Ma [60]. 

Both scenarios are reasonable, and although there is not enough 
confidence in the phylogenetic placement of phytolith fossils [32] 
to prefer the older timescale, we view the abundant grass phytolith 
record as a remarkable resource that promises to reveal much 
more about the timing of past events in grass history [60,61,71]. 
Regardless of the timeframe, we want to express the genuine 
uncertainty inherent in any of these point estimates of diversification 
Figure 3. Simplified representation of shifts in diversification rates across Poaceae based on calibration #1. Darker shades of grey 
indicate higher rates of diversification. Red triangles indicate the approximate phylogenetic placement of C 4 lineages. The left point of each triangle 
corresponds to the stem age of the inferred shift. The transition from dark green to yellow across the bottom indicates the average timing of the rise 
of open, grassland habitats on different continents [2]. Rate shifts correspond to Table S5 and are labeled as follows: 1) background diversification 
rate, 2) BEP+PACMAD, 3) Bambusoideae+Pooideae, 4) early diverging Pooideae, 5) Phaneospermateae, 6) Perrierbambus+Bonia clade, 7) Poeae 2 clade, 
8) Poa+Alopecurus clade, 9) Agrostis+Calamagrostis clade, 10) Festuca, 11) Core Panicineae, 12) Andropogoneae+Paspaleae, 13) Sorghum+Andropogon 
clade, 14) Axonopus+Paspalum clade, 15) Poecilostachys, 16) Eragrostis clade, 17) Spartina clade, 18) Tripogon (Table S5). 
doi:10.1371/journal.pone.0097722.g003 



rate shifts. The turboMEDUSA analyses were extremely sensitive to 
many variables, and a slightly different taxon sampling, AICc 
threshold, branch-length distribution, or distribution of missing taxa 
would all lead to different nodes being identified — sometimes 
extremely different nodes (Table S6). This makes this sort of 
approach to diversification analyses fairly difficult to interpret. 
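
As a rough illustration of why the choice of AICc threshold matters, the sketch below computes the standard small-sample corrected AIC for two nested models and asks whether the extra rate shift would be retained under two different cutoffs. The log-likelihoods, parameter counts, and sample size are hypothetical values chosen only for illustration and are not taken from our analyses. The function names are likewise illustrative and are not part of turboMEDUSA, and how that program counts parameters and observations is an assumption here.

```python
def aicc(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Small-sample corrected Akaike information criterion (AICc)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc is undefined unless n_obs > n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    correction = (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    return aic + correction


def accept_extra_shift(lnl_simple: float, k_simple: int,
                       lnl_complex: float, k_complex: int,
                       n_obs: int, threshold: float) -> bool:
    """Return True if the more complex model lowers AICc by at least `threshold`."""
    improvement = (aicc(lnl_simple, k_simple, n_obs)
                   - aicc(lnl_complex, k_complex, n_obs))
    return improvement >= threshold


if __name__ == "__main__":
    # Hypothetical values: a single-rate model versus a model with one
    # additional rate shift (assumed here to cost three extra parameters).
    for cutoff in (4.0, 8.0):
        kept = accept_extra_shift(-1000.0, 2, -994.0, 5, n_obs=500, threshold=cutoff)
        print(f"threshold={cutoff}: extra shift retained = {kept}")
```

With these illustrative numbers the same candidate shift is retained under the more permissive cutoff and rejected under the stricter one, which is the kind of threshold dependence described above.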



Historical patterns of diversification across such a large number of 
species over such a long period of time are necessarily highly 
complex and varied, and it seems unrealistic to assume that there 
are discrete, abrupt and identifiable shifts in lineage diversification 
through time. That said, it seems just as unrealistic to assign one 
diversification rate to C4 grasses and a second to C3 grasses. What 
makes the latter approach preferable is that it is a more diffuse 
analysis, averaged across all C4 and C3 branches throughout the 
tree, and that it tests a direct hypothesis that can be accepted or 
rejected. The specific branches and dates of the shifts identified 
with turboMEDUSA will surely vary as more complete phylogenies are 
developed, but we suspect that the BiSSE results and sister group 
comparisons will prove to be quite robust. 

Figure 4. Timing of shifts in diversification rate across both dating hypotheses. Both grey and black rectangles indicate shifts in 
diversification rate bounded by the estimated stem and crown node ages for the branch where the shift occurred. Error bars are determined by the 
95% confidence interval for each age estimate. Each shift is numbered and corresponds to the shifts in both Figure 3 and Table S5. The black 
rectangles indicate accelerations that occurred within C4 clades or immediately before C4 origins. Orange dots indicate the crown node ages for each 
of the estimated 24 origins of C4 photosynthesis in Poaceae. The red diamonds are the origins that are associated with subsequent rate shifts. The 
blue area indicates the time when grasslands are estimated to have arisen on various continents [55-62,2,63], and the green area is the time when C3 
grasslands were replaced by C4 grasses [65-70,2]. Overlap between the two is indicated by diagonal hatches. 
doi:10.1371/journal.pone.0097722.g004 

The effect of C4 photosynthesis is context dependent 

Although our results indicate that C4 photosynthesis increased 
diversification, its effect varies among lineages. Not every C4 clade 
experienced higher rates of diversification, and when rate 
accelerations occurred, they presumably occurred long after the 
initial C4 origin. The delay in C4 grass diversification might be the 
result of dependence on the development of a series of other 
adaptations to dry, open landscapes before C4 grasses became highly 
competitive. This corresponds to previous arguments that 
C4 photosynthesis is itself only one component in suites of 
characters that confer ecological success or dominance [2]. 
Perhaps in C4 lineages, the right combination of traits for rapid 
diversification did not emerge until considerable time had 
passed since the origin. Alternatively, it is well known that the C4 
advantage is highly context dependent [1,72], and C4 grasses 
might not have had the opportunity to diversify before open 
systems expanded in the Miocene. 

In addition to the PACMAD C4 radiations, we identified a 
series of concurrent non-C4 radiations in the BEP clade. These 
occurred mainly within the Pooideae, particularly in lineages with 
well-established cold climate tolerance and temperate zone 
diversity. Interestingly, none of the C3 accelerations occurred in 
lineages that occupy the same climate space as C4 grasses. At 
higher latitudes, the drought and cold tolerant Pooideae could 
have conceivably exploited similarly expanding open biomes, but 
under a cooler climatic regime where C4 photosynthesis is not 
adaptive (but see Still et al. 2014 for a new perspective on 
temperature tolerances of Pooid grasses [73]). This indicates that 
while not all diverse grass lineages are C4, it is primarily C4 clades 
that are able to take advantage of the warm tropical, open biomes. 

Concluding thoughts 

In studies of lineage diversification and character traits, the case 
of C4 photosynthesis is exceptional. There is an unusually high 
number (>66) of origins, providing a rare opportunity to test 
replicates and to identify the C4 effect independent of other 
clade-specific adaptations [74,4]. Our sister group comparisons provide 
the best possible control for aspects of evolutionary history in 12 
cases, and the complementary BiSSE analysis integrates over all of 
the origins and all of the heterogeneity in the phylogeny. The 
evolutionary history of grasses is inextricably linked with climatic 
and ecosystem changes throughout the Miocene that resulted in 
the global rise of grasslands [75,2]. The evolution of C4 
photosynthesis has long been recognized as an essential element 
of the ecological success of grasses in warm, open regions [1-3], 
and here we present compelling evidence that the C4 pathway has 
also behaved as a "key innovation", promoting elevated rates of 
lineage diversification during the assembly of the world's grassy 
biomes. 

Figure S1 Histograms of BiSSE model inferences. 

A.-F. are histograms from BiSSE analyses run on trees under dating 
hypothesis 1 (macrofossil), G.-N. are histograms from BiSSE 
analyses run on trees under dating hypothesis 2 (phytolith). All are 
based on the results from 100 replicated phylogenies with the 
missing species richness distributed either proportionally (sampling 
frequency) or as unresolved clades. A.-B. PACMAD, unresolved 
clades; C.-D. Poaceae sampling frequency; E.-F. Poaceae 
unresolved clades; G.-H. PACMAD sampling frequency; I.-J. 
PACMAD unresolved clades; K.-L. Poaceae sampling frequency; 
M.-N. Poaceae unresolved clades. 
(PDF) 

Figure S2 LTT time plot, using the 3,595 taxon tree. 

Showing the accumulation of a) all Poaceae species (black), b) C4 
species (blue), and c) C3 species (green) through time. 

(PDF) 

Table S1 The proportion of species represented by 
molecular data in our phylogeny. 

(DOC) 

Table S2 The number of taxa and genes used in 
phylogenetic analyses for each subclade. 

(DOC) 

Table S3 Maximum credibility tree results for each set 
(DOC) 